RELATED APPLICATIONS
This application claims the benefit of provisional application 60/261,716 filed
January 12, 2001.

STATEMENT REGARDING FEDERAL RIGHTS 
This invention was made with government support under Contract No. W-7405- 
ENG-36 awarded by the U.S. Department of Energy to The Regents of The University 
of California. The government has certain rights in the invention. 

FIELD OF THE INVENTION 
The present invention relates generally to protein identification using mass 
spectrometry and, more specifically, to the stable isotope mass tagging of selected 
amino acids which are incorporated into proteins in a sequence-specific manner during 
cell culturing to enable protein identification from the characteristic patterns in the mass 
spectra of proteolytic peptides. 

BACKGROUND OF THE INVENTION 
Proteomics is a newly emerging field in the post-genomics era 1 . A major activity 
of proteomics is the identification of unique proteins in cellular complexes in a high 
throughput mode 2 . Peptide mass mapping followed by database searching is a major 
approach towards the identification of a protein using mass spectrometry (MS). Using 
this approach the measured and calculated masses of proteolytic peptides are 
compared for a best mass-fit to possible proteins 3,4 . The most commonly used method 
is an in-gel digestion of the protein spots separated by two dimensional polyacrylamide 
gel electrophoresis (2D PAGE) for analysis by matrix-assisted laser 
desorption/ionization time-of-flight (MALDI-TOF) MS 5, 6 . Mass accuracy and precision 
are of prime importance to ensure specificity of the search for a target protein in 
database searches. 
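The mass-fit comparison described above can be illustrated with a minimal Python sketch. It computes the monoisotopic m/z of a singly protonated peptide from standard residue masses and compares it with a measured value. The function names and the 0.5 Da tolerance are illustrative assumptions and are not part of the method described here; the two sequences are tryptic peptides named later in this description.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O accounting for the free N- and C-termini
PROTON = 1.00728   # charge carrier of a singly protonated ion


def mh_plus(sequence: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + PROTON


def matches(measured: float, sequence: str, tol_da: float = 0.5) -> bool:
    """True if the measured m/z fits the sequence within tol_da (assumed)."""
    return abs(measured - mh_plus(sequence)) <= tol_da


print(round(mh_plus("IADNHTPK"), 2))    # 895.46
print(round(mh_plus("GTPWEGGLFK"), 2))  # 1091.55
```

Both printed values agree with the theoretical m/z values quoted for these peptides in the results below.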

The mass-to-charge (m/z) ratios of a large number of proteolytic peptides
covering much of the protein sequence must be precisely determined. Too few
proteolytic peptides from a target protein in a MALDI-TOF MS spectrum reduce the
specificity and precision of the database search and can give false positives. Currently,
the typical sequence coverage of a protein in a MALDI-TOF MS spectrum is less than
40% 7-14. This depends largely on sample availability 7, sample preparation methods 8,
matrix solution conditions 9, and matrix crystal morphology 10, as well as the physical
properties of proteins such as charged side chains 11,12, peptide hydrophobicity 13, and
the potential to form stable secondary structures 14. In most cases, MS data acquisition
and interpretation have proven to be time-consuming in the identification of unique
proteins in complexes because of problems such as low sample availability, background
or artifact ions, mass degeneracy of peptides from protein impurities, and post-synthetic
modifications of proteins 15. Ultrahigh mass accuracy provided by high-cost
instruments is often required to determine the absolute m/z values of these proteolytic
fragments 16,17. To increase the specificity of identification of proteolytic peptides, the
external labeling of the C-termini of tryptic peptides with H2O containing 50% 18O during
trypsin digestion has been used 18,19. Although this is a useful method for excluding
unrelated peaks from the data search, its selectivity and sensitivity are poor because only
the C-termini of all tryptic peptides are labeled with 18O.

It is necessary to extend the limited resource of peptide signals available in
MALDI-TOF MS spectra for characterizing proteins by further increasing the specificity
of proteolytic peptide identification. Stable isotope labeling, that is, the substitution of
13C for 12C, 15N for 14N, or 2H for 1H, in proteins or DNA oligomers can generate internal
mass "signatures" with characteristic mass shifts in their isotopic distribution patterns
without affecting their chemical and structural properties 20. Uniformly 15N-labeled
proteins have been generated for the accurate MS-based quantitation of protein
expression 21 and for improvements in the sensitivity and accuracy of molecular mass
measurements 22.

Stable isotope 13 C/ 15 N-labeled nucleotides have successfully been incorporated 
as internal markers to determine the nucleotide composition of PCR products 23 . 

Accordingly, it is an object of the present invention to increase the specificity of
mass spectrometric proteolytic peptide identification.

Additional objects, advantages and novel features of the invention will be set
forth in part in the description which follows, and in part will become apparent to those
skilled in the art upon examination of the following or may be learned by practice of the
invention. The objects and advantages of the invention may be realized and attained by
means of the instrumentalities and combinations particularly pointed out in the
appended claims.

SUMMARY OF THE INVENTION 
To achieve the foregoing and other objects, and in accordance with the purposes 
of the present invention, as embodied and broadly described herein, the method for 
identifying a protein hereof includes the steps of: separating the protein from other
proteins; digesting the protein, thereby forming first proteolytic peptides; acquiring the
monoisotopic mass distribution spectrum of the first proteolytic peptides and acquiring the
m/z values therefor; incorporating an amino acid 100% labeled with a stable isotope into
the protein in a sequence-specific manner; separating the protein bearing the labeled
amino acid from other proteins; digesting the protein bearing the labeled amino acid,
thereby forming second proteolytic peptides; acquiring the monoisotopic mass
distribution spectrum of the second proteolytic peptides and acquiring the m/z values
therefor; comparing the monoisotopic mass distribution spectrum of the second
proteolytic peptides with the monoisotopic mass distribution spectrum of the first
proteolytic peptides to determine the amino acid composition of the first proteolytic
peptides and the second proteolytic peptides, whereby the protein is identified from the
m/z values of the first proteolytic peptides and the m/z values of the second proteolytic
peptides and the amino acid composition of the first proteolytic peptides and the second
proteolytic peptides. The order in which the mass analysis of the labeled proteolytic
peptides and the mass analysis of the unlabeled proteolytic peptides are performed is not
important.
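The comparison step of this method can be sketched in Python as follows. For each peptide peak in the unlabeled spectrum, the sketch looks for a peak in the labeled spectrum displaced by a whole number of tag masses and reports that number as the count of labeled residues. The function name, the 0.2 Da tolerance and the upper limit of five residues are assumptions made for illustration; the peak values are those reported for the Met-d3 experiment in the results below.

```python
def residues_from_shift(unlabeled_mz, labeled_peaks, tag_da, max_n=5, tol=0.2):
    """Return n if a labeled peak lies at unlabeled_mz + n * tag_da, else None.

    tag_da is the nominal mass added per labeled residue (3 Da for Met-d3).
    max_n and tol are assumed search limits, not values from the text.
    """
    for n in range(max_n + 1):
        target = unlabeled_mz + n * tag_da
        if any(abs(peak - target) <= tol for peak in labeled_peaks):
            return n
    return None


unlabeled = [896.67, 1001.75]   # M+ ions of unlabeled UBL1 peptides
met_d3 = [896.67, 1004.85]      # corresponding M+ ions from UBL1-Met-d3

for mz in unlabeled:
    print(mz, residues_from_shift(mz, met_d3, tag_da=3.0))
# 896.67 0
# 1001.75 1
```

A result of 0 means the peptide contains no residue of the labeled type, which is itself a composition constraint for the database search.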

Preferably, the step of incorporating the 100% labeled amino acid into the protein
in a sequence-specific manner further includes the steps of: introducing the 100%
labeled amino acid into a cell capable of expressing the protein; and inducing the cell to
express the protein.

In another aspect of the present invention, in accordance with its objects and
purposes, the method for identifying a protein hereof includes the steps of: incorporating 
an amino acid 100% labeled with a stable isotope into the protein in a sequence-specific 
manner at a variable number of the sites for that amino acid in the protein, forming 
thereby a mixture of partially labeled proteins; separating the mixture of partially labeled 
proteins from other proteins; digesting the mixture of partially labeled proteins, thereby 
forming proteolytic peptides; and acquiring the monoisotopic mass distribution spectrum 
of the proteolytic peptides and acquiring the m/z values therefor, whereby the protein is 
identified from the m/z values of the proteolytic peptides and the amino acid 
composition of the proteolytic peptides. 

Preferably, the step of incorporating the 100% labeled amino acid into the protein 
in a sequence-specific manner at a variable number of sites for that one amino acid in 
the protein, further includes the steps of: introducing the 100% labeled amino acid and a 
chosen amount of an unlabeled same amino acid into a cell capable of expressing the 
protein; and inducing the cell to express the protein. 

Benefits and advantages of the present invention include the incorporation of mass
labels into specific proteolytic fragments, which significantly increases database-search
specificity, efficiency and accuracy for peptide sequencing and protein identification.

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and form a part of the 
specification, illustrate the embodiments of the present invention and, together with the 
description, serve to explain the principles of the invention. In the drawings: 

FIGURE 1 shows delayed-extraction MALDI mass spectra of tryptic digests of 
the unlabeled UBL1. 

FIGURE 2a shows monoisotopic patterns of peptides at m/z of 896.67 Da (M+)
and 1001.75 Da (M+) from tryptic digestion of: (A) unlabeled UBL1; (B) Met-d3 labeled
UBL1; and (C) a mixture of the Met-d3 labeled and unlabeled UBL1. FIG. 2b shows
monoisotopic patterns of peptides at m/z of 896.67 Da (M+) and 1001.75 Da (M+) from
tryptic digestion of: (A) unlabeled UBL1; and (B) a mixture of Gly-d2 labeled and
unlabeled UBL1. FIG. 2c shows the characteristic isotopic patterns of the large
tryptic fragment at m/z of 3644.88 (M+) for: (A) unlabeled UBL1; (B) Met-d3 labeled UBL1;
and (C) a mixture of the Met-d3 labeled and unlabeled UBL1.

FIGURE 3a shows the PSD fragment ion mass spectra of the fragment of
64 FLFEGQ 70 R containing the unlabeled glycine residue, while FIG. 3b shows postsource
decay fragment ion mass spectra of the fragment of 64 FLFEGQ 70 R containing the
labeled glycine residue, Gly-d2.

FIGURE 4a shows delayed-extraction MALDI mass spectra of tryptic digests of
50% Gly-d2 labeled E. coli cell lysate, while FIG. 4b shows delayed-extraction MALDI
mass spectra of tryptic digests of 50% Met-d3 labeled E. coli cell lysate.

FIGURE 5 shows the delayed-extraction MALDI-TOF spectrum of the tryptic
digests of the complex of interacting proteins of UBL1 and UBC9.
DETAILED DESCRIPTION

Briefly, the present invention includes the incorporation of stable isotope-labeled
amino acid residue(s) in proteins to "mass-tag" some proteolytic peptides according to
their content of these labeled residue(s). Stable isotope labeling of proteins is specific
for particular amino acid residues 24-26. Particular labeled amino acids are incorporated
into proteins during cell growth or in an in vitro transcription/translation system 26 in a
manner that provides residue-specific mass-labeled proteins without scrambling of the
label to other types of residues 24. A comparison of the masses of the peptides
generated from proteolytic digestion of the residue-specific labeled protein with those of
an unlabeled control assists in identifying the mass-tagged peptides, because modern
mass spectrometry, including MALDI-TOF MS, permits the accurate determination of
these changes with monoisotopic resolution 27,28. This provides an additional constraint
on the amino acid identity of mass-tagged peptides to enable accurate peptide
identification. Furthermore, the magnitude of the mass shift for a peptide reflects its
content of particular amino acid residue(s). A smaller number of identified mass-tagged
peptides is then used for more effective protein identification. It should be mentioned
that other mass spectrometers, such as electrospray mass spectrometers, can
effectively be employed in accordance with the teachings of the present invention.

Although partial amino acid sequences of selected peptides can be obtained
by postsource decay (PSD) experiments 29, many precursor ions obtained by
delayed-extraction (DE) MALDI do not produce sufficient PSD fragmentation to allow the
identification of even short sequence tags 30. In accordance with the teachings of the
present invention, the characteristic monoisotopic distribution pattern(s) of labeled
amino acid residues provide internal marker(s) for the assignment of PSD-derived
peptides. Thus, the incorporation of mass labels into specific proteolytic fragments
significantly increases database-search specificity, efficiency and accuracy for peptide
sequencing and protein identification.

Having generally described the present invention, the following detailed
description provides additional information.

I. MATERIALS AND PROCEDURES: 

A. Chemicals: Stable isotope enriched amino acid precursors, L-Methionine-99.9%-d3
(Met-d3) and Glycine-99.9%-2,2-d2 (Gly-d2), were purchased from Isotec Inc.
(Miamisburg, Ohio).

B. E. coli strains for residue-specific labeling of proteins: Twenty-one strains of
bacteria, each containing a different genetic defect closely linked to a selectable
transposon marker, were used to construct strains of E. coli with effective genotypes for
residue-specific, selective labeling of proteins with almost any stable isotope-labeled
amino acid. By using strains which have been modified to contain the appropriate
genetic lesions to control amino acid biosynthesis, dilution of the isotope label by
endogenous amino acid biosynthesis and scrambling of the label to other types of
residues were avoided. Clearly, other cell lines can be generated to perform the same
task.

1. E. coli strain CT2 was constructed by transduction of the BL21(DE3) strain
to tetR with a P1 lysate from MF14, and then screening for the gly-
phenotype 26. This derivative of BL21(DE3) was used for the selective
labeling of proteins with the stable isotope-labeled glycine.

2. Similarly, CT13 was constructed by transducing BL21(DE3) to tetR with a
P1 lysate from MF21, and then screening for the met- phenotype (metA-).
This metA- derivative of BL21(DE3) has the ideal genotype for selective
isotope labeling with methionine.
C. Residue-specific labeling of proteins and purification. The expression 
plasmid of UBL1 was transformed into both CT2 BL21(DE3) and CT13 BL21(DE3). 
According to the protocol given by Muchmore et al. 27, the CT13 BL21(DE3) cells were
grown in M9 minimal media supplemented with 0.2 g per liter of the L-Methionine-99.9%-d3,
0.02 g per liter of unlabeled cysteine, and 0.2 g per liter of each of the other
unlabeled amino acids. The CT2 BL21(DE3) cells were fed with a similar mixture that
contained the labeled precursor, 0.2 g of Glycine-99.9%-2,2-d2. These cells were
induced with 1 mM isopropylthiogalactoside (IPTG) for protein expression. It is clear
that amino acids other than methionine and glycine can be labeled and used in
accordance with the teachings of the present invention. Moreover, inducing
agents other than IPTG can be employed. The corresponding unlabeled protein was
expressed in regular LB media. The His-tagged proteins were purified in a buffer of 150
mM ammonium acetate (NH4OAc), pH 7.0, with a gradient of 0 - 150 mM imidazole.

D. Tryptic digestion and MALDI-MS analysis. The protein samples were
further desalted using C18 ZipTips (Millipore) and eluted with aqueous 50% acetonitrile
containing 0.1% TFA. After lyophilizing, the samples were resuspended in a buffer of
25 mM ammonium bicarbonate (NH4HCO3), pH 8.0. The unlabeled protein was mixed
with Met-d3- or Gly-d2-labeled proteins in a variety of molar ratios. Trypsin (Boehringer
Mannheim) was added to a final concentration of 10 ng/ml and the mixture was
incubated for 1 h or 16 h at 37 °C. For mass spectrometry analysis, 1 µl of
sample was mixed with 1 µl of a matrix solution (10 mg/ml) of α-cyano-4-
hydroxycinnamic acid, which was prepared by dissolving 10 mg in 1 ml of aqueous 50%
acetonitrile containing 0.1% trifluoroacetic acid (TFA).

Mass spectrometry experiments were carried out on a PE Voyager DE-STR
Biospectrometry workstation equipped with a N2 laser (337 nm, 3-ns pulse width, and
20-Hz repetition rate) in both linear and reflectron mode (PE Biosystems, Framingham,
MA). The mass spectra of the tryptic digests were acquired in the reflectron mode with
delayed extraction (DE). The m/z values of proteolytic peptides were calibrated with
Calmix 2, including Angiotensin I at 1297.51 Da (M+) and Insulin at 5734.59 Da (M+).

E. Mass tagging in an E. coli strain and the target protein identification.
The E. coli BL21(DE3) cell strain containing the UBL1 expression vector was cultured in
M9 media supplemented with a mixture of amino acids including 50% labeled amino
acid precursor (Gly-d2 or Met-d3). The cells were then induced with 1 mM
IPTG. An aliquot of the cell culture was collected 30 min after the IPTG induction, when
the target protein did not overwhelm the proteins in the total cell extract. After
centrifugation of the cell aliquot, the resulting pellet was resuspended and sonicated in a
buffer of 1 mM DTT and 20 mM NH4HCO3 at pH 8.0. The supernatant of the cell extract
was treated with trypsin (10 µg/ml) overnight without purification. The cell extract
containing the tryptic digests was then desalted by C18 ZipTip (Millipore) and analyzed
using MALDI-TOF MS.

F. Mass tagging for a complex mixture and MALDI-MS analysis. E. coli
BL21(DE3) cell strains containing the UBL1 and UBC9 expression vectors were mixed
in the same copy numbers and grown in M9 media supplemented with a mixture of
amino acids that included 50% deuterium-labeled glycine (Gly-d2). Both UBC9 and
UBL1 were readily expressed and labeled with Gly-d2 at all glycine residues in the
E. coli strains upon IPTG induction. The cell pellet was resuspended, sonicated and lysed
in a buffer of 1 mM DTT and 20 mM NH4HCO3 at pH 8.0. The Pharmacia Biotech
FPLC with a gel filtration mini-column (Superdex 75, 1.0 cm x 10 cm, Pharmacia
Biotech) was used to isolate the complex of UBL1 and UBC9 from the cell lysate. The
same buffer of 1 mM DTT and 20 mM NH4HCO3 at pH 8.0 was used for the protein
elution. The fraction containing the complex was lyophilized and then treated with
trypsin (10 µg/ml in 10 mM NH4HCO3, pH 8) overnight.

G. Post-source decay (PSD) measurements 29,30. PSD fragment ion
spectra were acquired for those peptides containing the labeled amino acids after
isolation of the appropriate precursor ion. Fragment ions were refocused onto the final
detector by stepping the voltage applied to the reflectron in the following ratios: 1.0000
(precursor ion segment), 0.9126, 0.8000, 0.7000, 0.6049, 0.4125, 0.2738, 0.1975,
0.1213, and 0.0900.
II. RESULTS:

A. Identification of the tryptic fragments containing stable isotope-
labeled amino acids.

The TABLE lists the theoretical m/z values and sequences of peptides generated
by tryptic digestion of the ubiquitin-like protein, UBL1 31. Partially 2H(d)-labeled glycine
and methionine residues, which are widely distributed in the protein, were incorporated
as the labeled precursors for mass signatures of certain peptides in the protein. Two
residue-specific labeled versions of UBL1, designated UBL1-Met-d3 and UBL1-Gly-d2,
were generated. The protein UBL1-Met-d3 was extracted from E. coli strain BL21(DE3)
CT13 cells transformed with the UBL1 expression vector and had the 2H-labeled
precursor, methionine-99.9%-S-methyl-d3 (Met-d3), incorporated at all of the methionine
sites of the protein. Similarly, the glycine-specific labeled protein, UBL1-Gly-d2,
extracted from E. coli BL21(DE3) CT2 cells, had the 2H-labeled precursor, glycine-
99.9%-2,2-methene-d2 (Gly-d2), incorporated at all glycine sites. Thus, for peptides
containing Met-d3 or Gly-d2 there was a 3 or 2 Da mass increase per methionine or
glycine residue, respectively, relative to their unlabeled counterparts.
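The expected mass increase for a given peptide follows directly from its sequence. The short Python sketch below counts the labeled residue in a sequence and multiplies by the nominal tag mass stated above (3 Da per Met-d3, 2 Da per Gly-d2). The function and dictionary names are illustrative assumptions; the sequences are UBL1 tryptic peptides named in this description.

```python
# Nominal mass added per labeled residue (Da), as stated in the text:
# three deuteriums in Met-d3 and two in Gly-d2.
TAG_SHIFT = {"M": 3, "G": 2}


def expected_shift(sequence: str, labeled_residue: str) -> int:
    """Nominal mass increase (Da) of a fully labeled peptide."""
    return sequence.count(labeled_residue) * TAG_SHIFT[labeled_residue]


for peptide in ("FLFEGQR", "QGVPMNSLR", "IADNHTPK"):
    print(peptide, expected_shift(peptide, "M"), expected_shift(peptide, "G"))
# FLFEGQR 0 2
# QGVPMNSLR 3 2
# IADNHTPK 0 0
```

Nominal integer shifts are used here because that is how the text reports them; the exact mass difference per deuterium is slightly larger than 1 Da.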

Reference will now be made in detail to the present preferred embodiments of
the invention which are illustrated in the accompanying drawings. FIGURE 1 shows the
mass spectrum obtained from a tryptic digest of the unlabeled UBL1. The PE Voyager-
DE STR MALDI-TOF MS has a mass resolution, M/ΔM, of 5000, which is sufficient to
resolve monoisotopic peaks of all the tryptic peptides of masses up to 5000 daltons
(Da). Inset A shows an expanded view of the monoisotopic distribution pattern
corresponding to the relative abundance of isotopes, M+:(M+1)+:(M+2)+ (M refers to
the mass corresponding to the most abundant isotope), of a small tryptic peptide with an
m/z value of 896.67 Da (M+ ion). As the number of atoms increases, the contribution of
the less abundant isotopes such as 13C, 15N or 2H also increases, so that at a higher m/z
the isotopic pattern is more pronounced, as shown in inset B (the m/z of the M+ ion is at
3644.91 Da). Ion fragments having m/z values of 1895.39, 2198.66, 2275.92, 2614.04 and
3155.54 probably derive from incomplete digestion and impurities and were not assigned
to the protein.

For a given monoisotopic distribution pattern of the peptides, particular fragment
ions containing the labeled precursor(s) shift in mass with respect to the unlabeled
control. FIGURE 2a shows the MALDI-TOF mass spectra of two proteolytic peptides
from: (A) unlabeled UBL1; (B) UBL1-Met-d3; and (C) a mixture of (A) and (B) in a 1:2
ratio. It was observed that the monoisotopic M+ ion at 1004.85 Da from UBL1-Met-d3
(B) was 3 Da heavier than that of the unlabeled UBL1 (1001.75 Da) (A) because of the
presence of the labeled methionine (FIG. 2a(A)). By contrast, no peak shift was
detected for the M+ ion at 896.67 Da also from the Met-d3 labeled protein (FIG. 2a(B)).
For (C), a pair of monoisotopic peaks separated by 3 Da between M+ ions of 1002.15
Da and 1005.17 Da was observed (FIG. 2a(C), right trace) but not at 896.67 Da (FIG.
2a(C), left trace). The ratio of the intensities of the upper and lower M+ mass ions, that
is, the ratio of the labeled and unlabeled proteins, is approximately 2:1. Because the
mass tag of a labeled methionine residue (Met-d3) is 3 Da, there is one Met residue in
the peptide at 1001.75 Da (M+ ion) and none in the peptide at 896.67 Da (M+ ion).
Thus, the 3-Da mass split pattern is characteristic for Met-d3-tagged peptides of the
protein. It may also be noted that the monoisotopic distribution patterns of these
labeled peptides are essentially unchanged when compared to the unlabeled peptides.
This is because only a few protons are replaced by deuterium in the labeled precursors.

FIGURE 2b shows monoisotopic patterns of peptides at m/z of 896.67 Da (M+)
and 1001.75 Da (M+) from tryptic digestion of: (A) unlabeled UBL1; and (B) a mixture of
the Gly-d2 labeled and unlabeled UBL1 (2:1 molar ratio). The incorporation of a Gly-d2
label can be recognized by the 2-Da split between the monoisotopic peaks of the
unlabeled and labeled peptides. A pair of monoisotopic peaks separated by 2 Da with
an intensity ratio of approximately 2:1 (upper to lower mass components) was observed
in the m/z ranges of 896.67-898.66 and 1001.75-1003.76, for approximately 60%
Gly-labeled UBL1 (UBL1-Gly-d2). This corresponds to one Gly residue in each of the
peptides (FIG. 2b(B)). In this case, the fragment ion of 896.67 Da (M+ ion) has one
Gly and no Met, and the tryptic fragment of 1001.75 Da (M+ ion) contains both a Gly and
a Met residue. The characteristic mass-split pattern (with 2 or 3 Da spacing) for the
immediate recognition of mass-tagged peptides of UBL1 containing the labeled
precursor(s) is thus established. In comparison with the theoretically calculated m/z
values listed in the TABLE, these two fragments were identified as 64 FLFEGQ 70 R and
55 QGVPMNSL 63 R, respectively. Although the fragment 71 IADNHTPK has an
m/z value of 895.46 Da, similar to that of the fragment 64-70, no mass tag or split was
observed for this fragment in either mixture. The presence or absence of internal mass
tags therefore can readily distinguish between these two peptides.
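The way a mass tag separates two peptides of similar m/z can be sketched in Python. The sketch converts an observed peak-pair spacing into a residue count and keeps only the candidate sequences whose composition agrees. The function names and the representation of the observations as a dictionary are illustrative assumptions; the peak values and sequences are those given above.

```python
def count_from_split(split_da: float, tag_da: float) -> int:
    """Number of labeled residues implied by an observed peak-pair spacing."""
    return round(split_da / tag_da)


def consistent(sequence: str, observed_counts: dict) -> bool:
    """True if the sequence contains exactly the observed residue counts."""
    return all(sequence.count(aa) == n for aa, n in observed_counts.items())


# Two tryptic peptides of UBL1 with similar m/z values.
candidates = ["FLFEGQR", "IADNHTPK"]

# Observed for the peak at 896.67 Da: a 2 Da split with Gly-d2
# (896.67 to 898.66) and no split with Met-d3.
observed = {"G": count_from_split(898.66 - 896.67, 2.0), "M": 0}

print([seq for seq in candidates if consistent(seq, observed)])
# ['FLFEGQR']
```

Only the composition test is shown; in practice it would be applied after the usual m/z match has produced the candidate list.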

FIGURE 2c shows the characteristic isotopic patterns of the large tryptic fragment at
m/z of 3644.88 (M+) for: (A) unlabeled UBL1; (B) Met-d3 labeled UBL1; and (C) a
mixture of the Met-d3 labeled and unlabeled UBL1 (2:1 molar ratio). The incorporation
of a Met-d3 label can be recognized by the 3-Da split between the monoisotopic peaks
of the unlabeled and labeled peptides. Changes in isotopic distribution patterns (FIG.
2c) were also observed for the larger fragment ions of 3644.88 Da (M+ ion) and 4521.65
Da (M+ ion) (incomplete digestion product, data not shown). For large fragments, the
number of monoisotopic peaks increases in proportion to the number of atoms. A mass
shift of 3 Da with respect to their unlabeled control was observed for both fragment ions
in the digestion product of UBL1-Met-d3 (compare FIGS. 2c(A) and 2c(B) for the
3644.88 Da fragment ion). FIGURE 2c(C) shows the mass spectrum of a mixture of the
unlabeled and Met-d3 labeled peptide of 3644.88 Da. Similarly, a mass shift of 6 Da
was observed (data not shown) for both peptides of m/z values of 3645.10 Da and
4521.99 Da for Gly-labeled UBL1 (UBL1-Gly-d2), which implies three Gly-d2 in both
peptides. The peak set at 4521.99 Da (M+ ion) was observed to diminish with longer
digestion times (overnight at 37 °C). This is consistent with a peptide resulting from an
incomplete digestion product. The difference in mass between these two peaks
(3645.10 and 4521.99 Da) is 876.89 Da, which is close to the m/z value of the fragment
71 IADNHTPK (M+ = 895.46) minus the mass of a water (H2O) molecule. Because the
M+ fragment ion at 4521.99 Da displays the same mass tag and isotopic distribution
pattern as the fragment ion at 3645.10 Da, it is clear that both peptides contain one Met
residue and three Gly residues and share a common segment. Thus, the fragment of
4521.99 Da is from the incomplete digestion of the last two fragments at the C-terminus
of the protein; that is, 71 IADNHTP 78 K and 79 ELGMEEEDVIEWQEQTGGHSTVLEHHHHH 107 H
(bold type indicates the labeled Gly, while the labeled Met is underlined).
The hydrolysis of the fragment 71-107 results from the addition of a water molecule at
the C-terminus of the fragment 71-78 to form the fragments 71-78 (the M+ ion at 895.46
Da) and 79-107 (the M+ ion at 3645.10 Da). It also suggests that the tryptic site of 78 Lys
linking the two peptides 71-78 and 79-107 is probably located in the core of the
protein and partially shielded from tryptic digestion. This observation is consistent with
the results of NMR studies of UBL1 indicating that 78 Lys is included in an α-helical
segment 32. This is an example of the use of mass tags to indicate possible secondary
structure of a protein.
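The arithmetic behind the incomplete-digestion assignment can be checked with a few lines of Python. The sketch below uses only the m/z values quoted above; the function name and the rounded water and proton masses are assumptions introduced for illustration.

```python
WATER = 18.0106   # monoisotopic mass of H2O (Da)
PROTON = 1.0073   # mass of the charge-carrying proton (Da)


def missed_cleavage_gap(full_mz: float, part_mz: float) -> float:
    """Mass difference between an undigested fragment and one of its parts."""
    return full_mz - part_mz


observed_gap = missed_cleavage_gap(4521.99, 3645.10)   # fragment 71-107 vs 79-107
expected_gap = 895.46 - WATER                          # 71 IADNHTPK minus water

print(round(observed_gap, 2))                 # 876.89
print(round(expected_gap, 2))                 # 877.45
print(round(observed_gap - expected_gap, 2))  # -0.56
```

The residual of about 0.6 Da is comparable to the mass errors reported for these measurements. Strictly, both peaks are singly protonated ions, so the gap equals the neutral mass of the missing segment less water, that is 895.46 - PROTON - WATER, or about 876.44 Da; this also lies within roughly 0.5 Da of the observed value.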

B. Internal isotopic markers for highly selective peptide sequencing
using post-source decay (PSD) 29,30.

As illustrated above, these stable isotope-labeled residues in proteolytic peptides are
useful indicators of the amino acid composition of mass-tagged peptides. In addition,
the characteristic mass-split pattern can further serve as an internal marker in the PSD
spectra to obtain detailed sequence information on mass-tagged peptides from a
protein. FIGURE 3a shows the PSD fragment ion mass spectrum of the fragment of
64 FLFEGQ 70 R containing the unlabeled glycine residue. The insets show expanded views
of the monoisotopic peaks of smaller PSD fragment ions in the m/z range of (A) 300-
350 Da, and (B) the precursor ion, M+ = 896.60. It is to be noted that there is no
immediate information concerning residue assignment in the spectrum even using the
PSD tool box in the software of the PE MALDI-TOF MS instrument. This is due to the
complexity of the fragmentation pattern. Many low-intensity precursor ions produced by
delayed-extraction MALDI do not yield enough PSD fragmentation to allow the
derivation of even short sequence tags. To demonstrate the use of labeled amino acid
precursors for rapid peptide sequencing, a peptide fragment containing 50% of the
labeled residue, Gly-d2, was selected for PSD experiments. FIGURE 3b shows the
PSD fragment ion mass spectra of the fragment of 64 FLFEGQ 70 R containing 50%
labeled glycine residue, Gly-d2. The insets show expanded views of the monoisotopic
peaks of smaller PSD fragment ions in the m/z range of (A) 300-350 Da, and (B) the
precursor ion, M+ = 896.604. The M+ ion of 50% Gly-d2 at 896.67 Da (FIG. 3b, inset
B) was selected as a PSD precursor because the characteristic mass-split pattern
indicates the location of the labeled glycine residue in the progressively produced
fragment ions through PSD. The gate width was adjusted for the full isotopic
distribution pattern of the PSD fragments. For smaller PSD fragment ions in the m/z
range of 300-370 Da, several peak sets with the characteristic mass-split pattern of the
partially Gly-d2-labeled fragments were immediately observed (FIG. 3b, inset A). The
a-17/a/b-17/b cursor available in the PSD tool box was applied to verify that the
Gly-containing b ion was at 343.27 Da. The determination of the b ion is a critical step for
the residue assignment in peptide sequencing using PSD. This identified b ion was
then used as an internal marker to trace the neighboring amino acid residues. 67 Glu
and 69 Gln were identified as the amino acids closest to the Gly-d2, and the peak at
343.27 Da was assigned to the fragment ion of 67 EGQ. From this core residue of
68 Gly-d2, the sequence of the M+ fragment of 896.67 Da has been determined.

C. Identification of UBL1 in an E. coli cell extract. The mass-tagged
peptides of UBL1 in the proteolytic digests of a protein extract from E. coli were also
identified. FIGURE 4a shows the delayed-extraction MALDI mass spectra of tryptic
digests of the cell lysates for the 50% Gly-d2 labeled E. coli cell lysate, while FIG. 4b
shows that for the 50% Met-d3 labeled E. coli cell lysate. The peaks at 896.67 Da (M+
ion) and 1001.75 Da (M+ ion), each with the characteristic 2 Da mass-split, were clearly
observed and result from Gly-d2 labeling of UBL1 in the presence of tryptic peptides
from the cellular proteins. A 3 Da mass-split was found for the peak of 1001.75 Da
(M+), but not for the peak of 896.67 Da (M+), when 50% Met-d3 was used as the labeling
precursor. Thus, these two specific UBL1 peptides indicate the presence of UBL1 in the
cell extract.

D. Identification of individual proteins in a complex mixture. It is known
that UBL1 interacts with the ubiquitin-conjugating enzyme (UBC9) during DNA
double-strand break repair 33. To demonstrate the use of the method of the present
invention for unique protein identification in a complex mixture, both proteins were
specifically labeled in E. coli cells, and mass-tagged peptides from each of these two
proteins were identified. These mass-tagged peptides, characterized by their m/z values
and partial amino acid composition, are considered to be the fingerprints of these proteins.

FIGURE 5 shows the MALDI-TOF spectrum of the tryptic digest of the complex. The
spectrum contains peak pairs with a 2 x n Da mass-split ("n" represents the number of
glycine residues) and an intensity ratio of about 1:1, which result from peptides
containing specifically labeled glycine. Three such characteristic peak pairs were
observed in the mass spectrum of the pool of tryptic digests: the peak pairs at
896.67 Da (M+ ion) and 1001.75 Da (M+ ion), each with the characteristic 2 Da
mass-split, and a pair of M+ ions at 1092.25 Da and 1098.31 Da with a 6 Da spacing.
The former two peak sets are mass-tagged peptides of the UBL1 protein. The latter pair
indicates that the fragment ion contains three glycines. The matched peptide is
GTPWEGGLFK of the UBC9 protein (the theoretical m/z value of the M+ ion is
1091.55 Da). To confirm the assignment, the ratio of unlabeled to labeled amino acid
precursors was varied. The change in the relative intensity of the 1092.25 Da (M+ ion)
peak to the 1098.31 Da (M+ ion) peak was in agreement with this assignment. Therefore,
the above assigned peptides provide "fingerprints" for the identification of UBL1 and
UBC9 not only through their matched m/z values, but also through their amino acid
compositions.
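
As a minimal sketch of how such peak pairs could be picked out of a peak list, the
function below reports every pair of peaks whose spacing is close to n x 2 Da. The
function name, the 0.1 Da tolerance and the upper limit on n are assumptions made for
illustration; the check on the 1:1 intensity ratio is omitted for brevity.

```python
def find_tagged_pairs(peaks, shift_per_residue=2.0, max_residues=4, tol=0.1):
    """Return (light, heavy, n) for peak pairs separated by n * shift_per_residue Da."""
    pairs = []
    ordered = sorted(peaks)
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            spacing = heavy - light
            n = round(spacing / shift_per_residue)
            if 1 <= n <= max_residues and abs(spacing - n * shift_per_residue) <= tol:
                pairs.append((light, heavy, n))
    return pairs

# 1092.25 and 1098.31 are the reported pair; the other two m/z values are
# hypothetical unlabeled peaks added only to show that they are not paired.
peaks = [1050.10, 1092.25, 1098.31, 1120.40]
pairs = find_tagged_pairs(peaks)
print(pairs)
```

With this input the only pair found is the one at 1092.25 Da and 1098.31 Da, with
n = 3, consistent with the three glycines of GTPWEGGLFK.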

III. DISCUSSION: 

A. Mass-tag measurements are relative and more accurate. 

Mass calibration was performed externally using the calibration standard, Calmix
2 (PE Biosystems). Typical observed mass errors were ±0.2 to ±0.4 Da compared to the
theoretically calculated masses for most peptides, which is expected for routine
MALDI-TOF measurements. The use of absolute m/z values of measured peptides with such
large errors (about 250 ppm) in database searching can result in the identification of a
number of proteins other than the target protein. An advantage of the mass-tagging
method of the present invention is that determining the mass of a tag requires only a
relative measurement; that is, the mass difference between the labeled and unlabeled
peptides. For example, although the absolute m/z value of an ion peak differs by 0.4
Da between the spectra of FIG. 2a (A) (1001.75) and FIG. 2a (B) (1002.15), the 3 Da
mass tag was accurately determined for a mixture of the labeled and unlabeled peptide
(FIG. 2a (C)). Therefore, relative mass tag measurements reduce the demand for
ultrahigh precision in the absolute m/z values of proteolytic fragments, which is
currently required for protein database searching. More importantly, because mass tag
measurements are relative, the identification of mass-tagged peptides will also be free
of uncertainties from functional post-translational modifications and from chemical
modifications that occur during polyacrylamide gel electrophoresis. The signals from
mass-tagged peptides can be corroborated by changing the relative ratio of the labeled
to unlabeled amino acid precursors.
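
The argument can be illustrated with a short calculation. The sketch below assumes, as
a simplification, that a calibration error shifts both peaks of a labeled/unlabeled pair
by the same amount; the function names and the 0.4 Da offset applied to the pair are
hypothetical, and the heavy peak is simply taken as the light peak plus the 3 Da Met-d3
tag.

```python
def ppm_error(measured_mz, reference_mz):
    """Absolute mass error expressed in parts per million of the reference m/z."""
    return abs(measured_mz - reference_mz) / reference_mz * 1e6

def mass_tag(light_mz, heavy_mz):
    """Mass tag as the difference between labeled and unlabeled peaks in one spectrum."""
    return heavy_mz - light_mz

# The same peak read as 1001.75 and 1002.15 in two spectra (values from the text).
disagreement_ppm = ppm_error(1001.75, 1002.15)

# A hypothetical calibration offset shifts both peaks of a pair equally,
# so it cancels in the difference.
light, heavy, offset = 1001.75, 1004.75, 0.4
tag_without_offset = mass_tag(light, heavy)
tag_with_offset = mass_tag(light + offset, heavy + offset)
print(round(disagreement_ppm), round(tag_without_offset, 2), round(tag_with_offset, 2))
```

A 0.4 Da discrepancy near m/z 1000 corresponds to roughly 400 ppm in the absolute
value, whereas the 3 Da difference is unchanged by the offset.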

B. Mass tagging provides another parameter for unique protein identification.

After separation of a protein complex by 2D PAGE 3,5,6, individual spots often
contain several proteins, which complicates protein assignments from proteolytic
digests. However, mass tagging with particular amino acids provides some amino acid
composition data on the labeled peptides that can be used as an additional constraint
for the m/z values used to identify these peptides. Experimentally, mass-tagged
peptides can easily be distinguished from a pool of peptides by their characteristic
mass-splitting patterns. The magnitude of the mass tags that are correlated with the
partial amino acid composition of peptides in data searches allows the identification of a
target protein from only a few mass-tagged peptides in the digest pool. It is also noted
in the TABLE that there are several tryptic fragments of UBL1 (that is, 730.39, 738.37,
and 1750.78 Da) that are either too weak to be of use, or missing from the mass
spectrum. These missing peptides, however, become less significant for protein
identification as long as other mass-tagged peptides can be identified in a
residue-specific manner.
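
A minimal sketch of the mass tag acting as an additional search constraint is given
below. The candidate list, the tolerances and the function name are assumptions made for
illustration. The second candidate is a hypothetical peptide chosen because an Asn
residue has almost exactly the same mass as Gly-Gly, so the two sequences cannot be
separated by m/z alone.

```python
def matches(candidate, observed_mz, observed_split,
            mz_tol=1.0, shift_per_gly=2.0, split_tol=0.2):
    """True if a candidate agrees in m/z and in the glycine count implied by the split."""
    sequence, calc_mz = candidate
    mz_ok = abs(calc_mz - observed_mz) <= mz_tol
    expected_split = sequence.count("G") * shift_per_gly
    split_ok = abs(expected_split - observed_split) <= split_tol
    return mz_ok and split_ok

# (sequence, calculated M+ m/z). The first entry is the UBC9 peptide from the text;
# the second is a hypothetical competitor with nearly the same mass.
candidates = [("GTPWEGGLFK", 1091.55), ("GTPWENLFK", 1091.55)]

observed_mz = 1092.25
observed_split = 1098.31 - 1092.25   # about 6 Da in the Gly-d2 experiment

by_mz_only = [seq for seq, mz in candidates if abs(mz - observed_mz) <= 1.0]
by_mz_and_tag = [c[0] for c in candidates if matches(c, observed_mz, observed_split)]
print(by_mz_only, by_mz_and_tag)
```

Both candidates pass the m/z filter, but only GTPWEGGLFK is consistent with a 6 Da
split, since the competitor contains a single glycine and would show a 2 Da split.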

C. Implications of the site-specific labeling technique for proteome
identification.

The present method is also generally applicable for the identification of unique
proteins in a complex. Residue-specific labeling in E. coli-expressed proteins using
genetically engineered E. coli cell strains has been demonstrated. We have also
examined isotopic scrambling of the residue-specific labeling of the protein, UBL1, with
proteins of the E. coli BL21(DE3) cell host. In the M9 media enriched with the 20
amino acids, the stable isotope enriched amino acids, L-Methionine-99.9%-d3 (Met-d3)
and Glycine-99.9%-2,2-d2 (Gly-d2), were used as the mass-tagging precursors for the
methionine and glycine sites, respectively. Negligible scrambling of the labels to other
types of residues was observed for the short growth period. By taking up amino acids
directly from the Minimum Essential Media 34 supplemented with a high concentration
of all 20 amino acids including labeled precursors, proteins expressed in mammalian
cells can also be labeled with specific amino acid(s). Within an appropriate growing
time, all proteins expressed in the media will be mass-tagged in those segments
containing the labeled amino acids.

Residue-specific mass tagging is particularly useful for the direct analysis of large
protein complexes, when a denatured and reduced protein complex is first digested to
peptide fragments in a sequence-specific manner, followed by liquid chromatography
separation and MS analysis 4. The experimentally measured m/z values of mass-tagged
peptides can be compared with the calculated m/z values of a proteolytic peptide library
derived from the predicted digestion of proteins translated from the genomic sequence
databases. The mass-tagged peptides identified from the matches will be selected for
the search and identification of unique proteins present in the translated genomic
databases. Because the mass tags in different proteins are sequence-specific and
correlated with their amino acid composition, this process will help resolve the mass
degeneracy arising from peptides with the same m/z values. In our data bank, both the
m/z values of peptides and the mass tags of certain peptides can be utilized in selective
database searches for the unique identification of different proteins in complex mixtures.
The specificity and accuracy of protein identification will be significantly increased by
this analytical methodology of residue-specific mass tagging.
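
The library comparison described above can be sketched as follows. This is an
illustration under stated assumptions and not the database software used by the
inventors: the protein sequence is hypothetical (only GTPWEGGLFK is a peptide named in
the text), the digestion rule is the common simplification that trypsin cleaves after K
or R except before P, missed cleavages are ignored, and the residue-mass table covers
only the residues needed here.

```python
# Monoisotopic residue masses (Da) for the residues used in this example only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "L": 113.08406, "E": 129.04259, "K": 128.09496, "M": 131.04049,
    "F": 147.06841, "R": 156.10111, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (residue in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides

def singly_protonated_mz(peptide):
    """Calculated m/z of the singly protonated, unlabeled peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON

def build_library(protein, labeled_residue="G", shift_per_residue=2.0):
    """Map each tryptic peptide to its calculated m/z and expected mass tag."""
    return {
        pep: (round(singly_protonated_mz(pep), 2),
              pep.count(labeled_residue) * shift_per_residue)
        for pep in tryptic_digest(protein)
    }

# Hypothetical sequence; only GTPWEGGLFK is a peptide named in the text.
library = build_library("AGRGTPWEGGLFKMPK")
print(library)
```

Each library entry pairs a calculated m/z with the mass tag expected for the chosen
labeled residue, so a measured peak pair can be matched on both values. For
GTPWEGGLFK the calculated m/z is 1091.55, in agreement with the theoretical value
quoted earlier, and the expected Gly-d2 tag is 6 Da.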

The foregoing description of the invention has been presented for purposes of
illustration and description and is not intended to be exhaustive or to limit the invention
to the precise form disclosed, and obviously many modifications and variations are
possible in light of the above teaching. The embodiments were chosen and described
in order to best explain the principles of the invention and its practical application to
thereby enable others skilled in the art to best utilize the invention in various
embodiments and with various modifications as are suited to the particular use
contemplated. It is intended that the scope of the invention be defined by the claims
appended hereto.






REFERENCES:

1. W.P. Blackstock and M.P. Weir, Trends in Biotech. 1999, 17, 121-127.
2. J.R. Yates, J. of Mass Spectrom. 1998, 33, 1-19.
3. G. Neubauer et al., Nature Genetics 1998, 20, 46-50.
4. A. Link et al., Nature Biotech. 1999, 17, 676-682.
5. P. Chaurand et al., J. of Am. Soc. for Mass Spectrom. 1999, 10, 91-103.
6. A. Shevchenko et al., Proc. Natl. Acad. Sci. USA 1996, 93, 14440-14445.
7. C. Scheler et al., Electrophoresis 1998, 19, 918-927.
8. M. Kussmann et al., J. of Mass Spectrom. 1997, 32, 593-601.
9. S.L. Cohen and B.T. Chait, Anal. Chem. 1996, 68, 31-37.
10. F. Amado et al., Rapid Commun. in Mass Spectrom. 1997, 11, 1347-1352.
11. Y.F. Zhu et al., Rapid Commun. in Mass Spectrom. 1995, 9, 1315-1320.
12. E. Krause et al., Anal. Chem. 1999, 71, 4160-4165.
13. Z. Olumee et al., Rapid Commun. in Mass Spectrom. 1995, 9, 744-752.
14. H. Wenschuh et al., Rapid Commun. in Mass Spectrom. 1998, 12, 115-119.
15. P.M. Rudd et al., Biochemistry 1994, 33, 17-22.
16. M. Wang and A.G. Marshall, Anal. Chem. 1989, 61, 1288-1293.
17. B. vandenBerg et al., J. Mol. Biol. 1999, 290, 781-796.
18. K. Rose et al., Biochem. J. 1983, 215, 273-277.
19. J. Qin et al., Rapid Commun. in Mass Spectrom. 1998, 12, 209-216.
20. A. Ono et al., Stable Isotope Applications in Biomolecular Structure and
Mechanisms (Ed. J. Trewhella et al.) (Los Alamos Natl. Lab., New Mexico).
21. Y. Oda et al., Proc. Natl. Acad. Sci. USA 1999, 96, 6591-6596.
22. P.K. Jensen et al., Anal. Chem. 1999, 71, 2076-2084.
23. X. Chen et al., Anal. Chem. 1999, 71, 3118-3125.
24. D.S. Waugh, J. Biomol. NMR 1996, 8, 184-92.
25. D.C. Muchmore et al., Methods in Enzymology 1989, 177, 45-71.
26. T. Yabuki et al., J. Biomol. NMR 1998, 11, 295-306.
27. F. Hillenkamp et al., Anal. Chem. 1991, 63, 1193A-1203A.
28. O.N. Jensen et al., Rapid Commun. in Mass Spectrom. 1996, 10, 1371-1378.
29. R. Kaufmann et al., Rapid Commun. in Mass Spectrom. 1996, 10, 1199-1208.
30. T. Keough et al., Proc. Natl. Acad. Sci. USA 1999, 96, 7131-7136.
31. Z. Shen et al., Genomics 1996, 37, 183-186.
32. P. Bayer et al., J. Mol. Biol. 1998, 280, 275-286.
33. Q. Liu et al., J. Biol. Chem. 1999, 274, 16979-16987.
34. Gibco BRL products & reference guide 2000-2001 pp 1-1 - 10-1.






